SUMMARY


In causal mediation analysis, the definitions of the natural direct and indirect effects involve potential 
outcomes that can never be observed, so-called a priori counterfactuals. This conceptual challenge trans- 
lates into issues in identification, which requires strong and often unverifiable assumptions, including 
sequential ignorability. Alternatively, we can deal with post-treatment variables using the principal strati- 
fication framework, where causal effects are defined as comparisons of observable potential outcomes. We 
establish a novel bridge between mediation analysis and principal stratification, which helps to clarify and 
weaken the commonly-used identifying assumptions for natural direct and indirect effects. Using princi- 
pal stratification, we show how sequential ignorability extrapolates from observable potential outcomes 
to a priori counterfactuals, and propose alternative weaker principal ignorability-type assumptions. We 
illustrate the key concepts using a clinical trial. 


Some key words: Causal inference; Identification; Potential outcome; Principal stratification 


1. INTRODUCTION 


Mediation analyses decompose causal effects into channeled effects through some mediator that lies in 
the pathway between the treatment and the outcome, and un-channeled effects not through this mediator. 
We define channeled and un-channeled effects using the concepts of natural direct and indirect effects. 
The latter effects raise identifiability issues because they are defined as comparisons between potential 
outcomes of various types, on some of which data contain no or little information without strong assump- 
tions. Inferences on these effects usually rest on sequential ignorability, which combines ignorability of 
treatment assignment given a set of pre-treatment covariates and ignorability of the mediator given the 
treatment and pre-treatment covariates (Robins & Greenland, 1992). Under sequential ignorability, natu- 
ral direct and indirect effects can be identified from the data using the mediation formula (Pearl, 2001). 

Sequential ignorability implies that, conditional on covariates, there is no unmeasured confounding
of the treatment-mediator, treatment-outcome and mediator-outcome relationships. These assumptions
therefore require that the mediator be regarded, at least in principle, as an additional treatment that
could be manipulated by an intervention. Sequential ignorability is not directly verifiable from the
observed data and its plausibility is not always well understood.

We provide insight into sequential ignorability using the concepts of principal stratification (Frangakis 
& Rubin, 2002) and principal ignorability (Jo & Stuart, 2009; Ding & Lu, 2017) in the case of a binary 
mediator. We make the following contributions. First, we use principal ignorability to offer an alternative 
interpretation of sequential ignorability, which may seem more natural in some settings. Second, we use 
principal stratification to clarify the source of information on natural direct and indirect effects under
sequential ignorability. Third, we elucidate the relationship between sequential and principal ignorability 
under an additional monotonicity assumption. Fourth, we propose a new set of assumptions to identify 
natural direct and indirect effects, and investigate their relationships with sequential ignorability. 


2. NOTATION, FRAMEWORK AND IDENTIFIABILITY IN MEDIATION ANALYSIS 
2.1. Potential outcomes and causal effects


For each individual $i$ characterized by covariates $X_i$, let $Z_i$ represent a binary treatment, with $Z_i = 1$
for those assigned to the active treatment and $Z_i = 0$ for those assigned to the control. Let $Y_i(z)$ and
$M_i(z)$ be the potential values of a primary endpoint, $Y$, and of a binary post-treatment intermediate
variable, $M$, that we would observe under treatment level $z$ $(z = 0, 1)$ for unit $i$. In mediation analysis,
$M$ is referred to as a mediator.

For each unit $i$ the observed data include covariates $X_i$, the treatment $Z_i$, and the observed values of
the mediator and outcome, which can be defined, by consistency, as
$M_i^{\mathrm{obs}} = M_i(Z_i) = Z_i M_i(1) + (1 - Z_i) M_i(0)$ and
$Y_i^{\mathrm{obs}} = Y_i(Z_i) = Z_i Y_i(1) + (1 - Z_i) Y_i(0)$.

The purpose of mediation analysis is to investigate the extent to which the mediator plays a role in
the effect of the treatment on the outcome. To formalize causal effects that can answer such a question,
Robins & Greenland (1992) and Pearl (2001) extended the above potential outcomes by introducing the
double-indexed notation $Y_i(z, m)$, which denotes the potential outcome for unit $i$ that would occur if
the treatment were set to level $z$, and if the mediator were manipulated to level $m$. Furthermore, we can
define an additional potential outcome, $Y_i(z, M_{iz'})$, where the level of the mediator is determined by
an intervention that sets the treatment to level $z'$. If $z' = z$, then $Y_i(z) = Y_i(z, M_{iz})$ under the
composition assumption (VanderWeele, 2015). We use $M_{iz}$ for $M_i(z)$ in the nested potential outcomes.

The average causal effect conditional on covariates at level $X_i = x$, $\mathrm{ACE}(x) = E\{Y_i(1) - Y_i(0) \mid x\}$,
can be decomposed into the sum of a natural direct effect,

$$\mathrm{NDE}(z \mid x) = E\{Y_i(1, M_{iz}) - Y_i(0, M_{iz}) \mid x\}, \quad (z = 0, 1) \tag{1}$$

and a natural indirect effect,

$$\mathrm{NIE}(z \mid x) = E\{Y_i(z, M_{i1}) - Y_i(z, M_{i0}) \mid x\}, \quad (z = 0, 1) \tag{2}$$

as $\mathrm{ACE}(x) = \mathrm{NDE}(z \mid x) + \mathrm{NIE}(1 - z \mid x)$ (Robins & Greenland, 1992; Pearl, 2001). The natural
direct effect $\mathrm{NDE}(z \mid x)$ is the average effect of the treatment when the mediator is kept at the level that
would potentially be observed under treatment $z$, and the natural indirect effect $\mathrm{NIE}(z \mid x)$ is the average
effect of a change in the mediator, achieved by a hypothetical intervention that sets the treatment to level
$z$. All the effects are defined conditional on covariates.
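
The decomposition can be checked numerically on a toy population in which, hypothetically, every potential value is known. The sketch below is illustrative only: the four units, their values and the helper names `nested`, `nde` and `nie` are assumptions made for this example and are not taken from the morphine study; covariates are omitted, which corresponds to working within a single level of $x$.

```python
from statistics import mean

# Hypothetical toy population: each unit carries (M(0), M(1)) and all four Y(z, m).
units = [
    {"M": (0, 0), "Y": {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 2.0}},
    {"M": (0, 1), "Y": {(0, 0): 6.0, (0, 1): 4.5, (1, 0): 5.0, (1, 1): 2.5}},
    {"M": (1, 1), "Y": {(0, 0): 4.0, (0, 1): 3.5, (1, 0): 3.0, (1, 1): 1.0}},
    {"M": (0, 1), "Y": {(0, 0): 7.0, (0, 1): 5.0, (1, 0): 6.5, (1, 1): 3.0}},
]


def nested(u, z, z_prime):
    """Y(z, M(z')): outcome under treatment z with the mediator at its level under z'."""
    return u["Y"][(z, u["M"][z_prime])]


def nde(z):
    """Natural direct effect: change the treatment, hold the mediator at M(z)."""
    return mean(nested(u, 1, z) - nested(u, 0, z) for u in units)


def nie(z):
    """Natural indirect effect: hold the treatment at z, move the mediator from M(0) to M(1)."""
    return mean(nested(u, z, 1) - nested(u, z, 0) for u in units)


# Average causal effect, using Y(z) = Y(z, M(z)) (composition).
ace = mean(nested(u, 1, 1) - nested(u, 0, 0) for u in units)
```

In this toy population both `nde(0) + nie(1)` and `nde(1) + nie(0)` reproduce `ace`. No real data set reveals $Y_i(z, M_{iz'})$ for $z \neq z'$, so the check is possible only because the values are invented.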

Throughout the paper, we use a randomized clinical trial, the morphine study (Borracci et al., 2013), to 
convey the intuition behind the assumptions and illustrate how one can reason about their plausibility. 


Example 1. Baccini et al. (2017) analyzed the morphine study to assess the extent to which the effect
of preoperative oral administration of morphine sulphate on post-operative pain intensity is mediated
by post-operative self-administration of intravenous morphine sulphate by patients. A sample of patients
undergoing elective open colon-rectal abdominal surgery were randomly assigned to receive either oral
morphine sulphate, $Z_i = 1$, or oral midazolam, $Z_i = 0$. The control is an active placebo with a sedative
effect. For each patient, we observe gender and age. For patient $i$ under treatment $z$, the potential outcome
$Y_i(z)$ is the value of post-operative pain intensity, and $M_i(z)$ is a binary indicator equal to 1 if the
patient self-administered a low level of morphine sulphate after surgery and 0 if the patient
self-administered a high level. Moreover, $Y_i(z, m)$ and $Y_i(z, M_{iz'})$ denote the values of post-operative
pain intensity for patient $i$ that would occur if the patient's treatment were set to level $z$, and the
patient's post-operative morphine consumption were manipulated to levels $m$ and $M_i(z')$, respectively.
2.2. Identification issues and sequential ignorability


Potential outcomes of the form $Y_i(z, M_{iz'})$, with $z \neq z'$, are referred to as cross-world counterfactuals
(Robins & Greenland, 1992) or a priori counterfactuals (Frangakis & Rubin, 2002). They can never be
observed in one experiment, because they result from hypothetically assigning each unit to two different
treatments simultaneously (Mealli & Mattei, 2012; Forastiere et al., 2016). Although we can hypothesize
their existence, a priori counterfactuals are conceptually different from potential outcomes of the form
$Y_i(z)$, which are observable potential outcomes. The potential outcome $Y_i(z, M_{iz'})$ is observable only if
either $z = z'$ or $M_i(z) = M_i(z')$, i.e., $Y_i(z) = Y_i(z, M_{iz}) = Y_i(z, M_{iz'})$, and is actually observed when
the treatment received by unit $i$ is $Z_i = z = z'$. Although ignorability of the treatment suffices to identify
the marginal distributions of potential outcomes of the form $Y_i(z)$, and hence the average causal effect,
$\mathrm{ACE}(x)$, identification of the marginal distributions of a priori counterfactuals, and hence of natural direct
and indirect effects, requires additional assumptions that would allow extrapolation to a priori
counterfactuals based on the observed data.

There are different sets of identifying assumptions for the natural direct and indirect effects (Pearl, 2001;
Van Der Laan & Petersen, 2008; Hafeman & VanderWeele, 2011; Imai, Keele & Yamamoto, 2010). Ten
Have & Joffe (2012) provide a review. The differences among them are subtle and, broadly speaking, they
all couple the ignorability of the treatment with the ignorability of the mediator conditional on covariates.
Here we focus on the assumptions used by Imai, Keele & Yamamoto (2010):


Assumption 1 (Ignorability of the treatment). $\{Y_i(z, m), M_i(z')\} \perp\!\!\!\perp Z_i \mid X_i$ for all $z, z', m = 0, 1$;
Assumption 2 (Ignorability of the mediator). $Y_i(z, m) \perp\!\!\!\perp M_i(z') \mid (Z_i = z', X_i)$ for all $z, z', m = 0, 1$.


Imai, Keele & Yamamoto (2010) refer to Assumptions 1 and 2 together as sequential ignorability.
Assumption 1 is the ignorability of the treatment, and Assumption 2 states that the mediator is ignorable
given the observed treatment and covariates. Under Assumptions 1 and 2,


$$E\{Y_i(z, M_{iz'}) \mid x\} = \sum_{m=0,1} E(Y_i^{\mathrm{obs}} \mid Z_i = z, M_i^{\mathrm{obs}} = m, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = m \mid Z_i = z', x), \tag{3}$$

which is referred to as the mediation formula (Pearl, 2001). We see from (3) that the average of the
potential outcome $Y_i(z, M_{iz'})$ can be identified from the observed data by the conditional expectation of
the observed outcomes given treatment level $z$, averaged over the conditional distribution of the observed
mediator given treatment level $z'$.
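
As an illustration of how (3) is evaluated, the following sketch computes its sample analogue within a single covariate level. It is a minimal plug-in calculation, not the estimation procedure of any cited paper; the function name `mediation_formula` and the eight records are hypothetical.

```python
from collections import defaultdict


def mediation_formula(records, z, z_prime):
    """Sample analogue of (3) within one covariate level.

    records: iterable of (Z, M_obs, Y_obs) tuples with binary Z and M_obs.
    Returns an estimate of E{Y(z, M(z'))}, valid under Assumptions 1 and 2.
    """
    y_sum = defaultdict(float)  # sum of Y_obs in each (Z, M_obs) cell
    n_cell = defaultdict(int)   # number of units in each (Z, M_obs) cell
    n_arm = defaultdict(int)    # number of units in each treatment arm
    for zi, mi, yi in records:
        y_sum[(zi, mi)] += yi
        n_cell[(zi, mi)] += 1
        n_arm[zi] += 1
    if n_arm[z_prime] == 0:
        raise ValueError("no units observed with Z = z'")
    total = 0.0
    for m in (0, 1):
        weight = n_cell[(z_prime, m)] / n_arm[z_prime]  # pr(M_obs = m | Z = z')
        if weight == 0.0:
            continue
        if n_cell[(z, m)] == 0:
            raise ValueError("empty (Z = z, M_obs = m) cell required by the formula")
        total += weight * y_sum[(z, m)] / n_cell[(z, m)]  # E(Y_obs | Z = z, M_obs = m)
    return total


# Hypothetical records (Z, M_obs, Y_obs), for illustration only.
data = [(1, 0, 4.0), (1, 0, 6.0), (1, 1, 2.0), (1, 1, 2.0),
        (0, 0, 7.0), (0, 0, 7.0), (0, 0, 10.0), (0, 1, 3.0)]
cross_world = mediation_formula(data, z=1, z_prime=0)  # estimate of E{Y(1, M(0))}
same_world = mediation_formula(data, z=1, z_prime=1)   # mean of Y_obs in arm Z = 1
```

With $z = z'$ the formula collapses to the arm mean of the observed outcome, as expected from $Y_i(z) = Y_i(z, M_{iz})$; only the case $z \neq z'$ relies on Assumption 2.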


2.3. Principal stratification


Frangakis & Rubin (2002) introduced the principal stratification framework to deal with post-treatment
variables. A principal stratification with respect to a post-treatment variable $M$ is a partition of units into
latent subpopulations, called principal strata, defined by the joint potential values of that post-treatment
variable under each level of the treatment. Denote by $G_i = \{M_i(0), M_i(1)\}$ the principal stratum
membership. Given a binary mediator, $G_i \in \{00, 01, 10, 11\}$. In Example 1, we call $G_i = 00$ pain-intolerant
patients, $G_i = 01$ compliant patients, $G_i = 10$ defiant patients, and $G_i = 11$ pain-tolerant patients.

A principal causal effect is a comparison between the potential outcomes within a particular principal
stratum. We focus on average principal causal effects, defined as
$\mathrm{PCE}(g \mid x) = E\{Y_i(1) - Y_i(0) \mid G_i = g, x\}$. The average causal effect is a weighted average of the
principal causal effects, $\mathrm{ACE}(x) = \sum_g \mathrm{PCE}(g \mid x)\, \pi_{g|x}$, where the summation is over
$g \in \{00, 01, 10, 11\}$, and $\pi_{g|x} = \mathrm{pr}(G_i = g \mid x)$ is the conditional probability of the principal
stratum $g$. Frangakis & Rubin (2002) call $\mathrm{PCE}(11 \mid x)$ and $\mathrm{PCE}(00 \mid x)$ dissociative effects. These
subgroups, for which the mediator is not affected by the treatment, provide information on the natural
direct effect of the treatment. They call $\mathrm{PCE}(01 \mid x)$ and $\mathrm{PCE}(10 \mid x)$ associative effects. These
subgroups, for which the mediator is affected by treatment, generally combine natural direct and indirect
effects (Mealli & Mattei, 2012). See VanderWeele (2008) for further discussion.

The principal strata membership is in general unknown, as we cannot observe both potential values 
of the mediator in a single experiment. This inherent latent nature of principal strata jeopardizes the 
identification of principal causal effects without additional assumptions. 
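
To make the definitions concrete, the sketch below labels principal strata and verifies that the average causal effect is the stratum-proportion-weighted average of the principal causal effects. The five units are hypothetical and carry both potential values of the mediator, which is precisely the information that real data never provide; the helper name `stratum` is likewise an assumption of this example.

```python
from statistics import mean

# Hypothetical units with both potential mediator values and both potential outcomes.
# Each tuple is (M(0), M(1), Y(0), Y(1)).
units = [
    (0, 0, 5.0, 4.0),
    (0, 1, 6.0, 2.5),
    (0, 1, 7.0, 3.0),
    (1, 1, 3.5, 1.0),
    (1, 0, 4.0, 4.5),
]


def stratum(m0, m1):
    """Label G = {M(0), M(1)} as one of '00', '01', '10', '11'."""
    return f"{m0}{m1}"


# Collect unit-level effects Y(1) - Y(0) by principal stratum.
effects_by_stratum = {}
for m0, m1, y0, y1 in units:
    effects_by_stratum.setdefault(stratum(m0, m1), []).append(y1 - y0)

pi = {g: len(d) / len(units) for g, d in effects_by_stratum.items()}  # stratum proportions
pce = {g: mean(d) for g, d in effects_by_stratum.items()}             # principal causal effects

ace = mean(y1 - y0 for _, _, y0, y1 in units)
weighted = sum(pi[g] * pce[g] for g in pi)  # sum over g of PCE(g) * pi_g
```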
3. GENERALIZED STRONG PRINCIPAL IGNORABILITY AND THE MEDIATION FORMULA


Principal ignorability was introduced for the identification of principal causal effects (Jo & Stuart, 2009; 
Ding & Lu, 2017; Feller et al., 2016). Here, we generalize it for mediation analysis: 


Assumption 3 (Generalized strong principal ignorability). $Y_i(z, m) \perp\!\!\!\perp G_i \mid X_i$ for all $z, m = 0, 1$.


Assumption 3 requires that the distribution of potential outcomes $Y_i(z, m)$ be the same across principal
strata, conditional on covariates. Because the heterogeneity across principal strata can be interpreted as
heterogeneity with respect to a latent variable (Forcina, 2006), Assumption 3 can also be seen as ruling
out the presence of unmeasured confounding of the mediator-outcome relationship (Ding & Lu, 2017). In
the following, we present results that help to clarify the relationship between Assumptions 2 and 3. While
the former involves marginal independence between the potential outcomes and each of the two potential
values of the mediator, the latter assumes joint independence. Therefore, given Assumption 1, Assumption 3
implies Assumption 2. The converse need not hold: there can be situations where principal strata are
heterogeneous, i.e., Assumption 3 does not hold, but Assumption 2 holds. Even if the joint distribution of
$M_i(0)$ and $M_i(1)$ depends on a latent variable also affecting the outcome, the marginal distributions of
the two potential mediators might be free of unmeasured confounding. The proposition below then follows.


PROPOSITION 1. Under Assumptions 1 and 3, the mediation formula (3) holds.


Proposition 1 states that the average of a priori counterfactuals can be identified from the observed data
in the same way, that is, by the mediation formula (3), under either Assumptions 1 and 2 or Assumptions 1
and 3. Although Assumption 3 is stronger than Assumption 2, in some cases the plausibility of Assumption
3 might be easier to justify, because it can help to think in terms of homogeneity across principal strata
rather than in terms of no unmeasured confounding of the mediator-outcome relationship.

In Example 1, Assumption 2 requires that, at least in principle, we can conceive an intervention on post- 
operative morphine consumption, and assume that it is randomly assigned within each treatment group, 
conditional on covariates. Thus, Assumption 2 rules out unobserved confounders that causally affect both 
post-operative morphine consumption and pain intensity given the treatment and pretreatment covariates. 
Although hypothetical interventions on post-operative morphine consumption might be conceivable, they 
might be unethical. Moreover, it might be difficult to argue that all relevant confounders of the relationship 
between post-operative morphine consumption and pain intensity have been observed, especially in the 
morphine study with only two covariates. It might be easier to envision the plausibility of Assumption 3, 
which requires that the potential outcomes for pain intensity that would occur if the treatment were set 
to level z and the post-operative morphine consumption were set to level m have the same distributions 
across pain-tolerant, pain-intolerant, compliant and defiant patients with the same value of the covariates. 


4. INTERPRETATION OF THE MEDIATION FORMULA: EXTRAPOLATION ACROSS PRINCIPAL STRATA 


We aim to clarify the extrapolation of information on a priori counterfactuals performed by the
mediation formula (3). In principle, the average potential outcome is a weighted average of the same
potential outcome across principal strata, with weights given by principal strata proportions. The following
proposition shows what part of the observed data and which type of units provide information on potential
outcomes $Y_i(z, M_{iz'})$, which can be a priori counterfactuals for some units if $z \neq z'$.


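PROPOSITION 2. Under Assumption 1, if either Assumption 2 or 3 holds, then

$$\begin{aligned}
E\{Y_i(1, M_{i0}) \mid x\} = {} & \left[ E\{Y_i(1) \mid G_i = 00, x\} \frac{\pi_{00|x}}{\pi_{00|x} + \pi_{10|x}} + E\{Y_i(1) \mid G_i = 10, x\} \frac{\pi_{10|x}}{\pi_{00|x} + \pi_{10|x}} \right] (\pi_{00|x} + \pi_{01|x}) \\
& + \left[ E\{Y_i(1) \mid G_i = 11, x\} \frac{\pi_{11|x}}{\pi_{11|x} + \pi_{01|x}} + E\{Y_i(1) \mid G_i = 01, x\} \frac{\pi_{01|x}}{\pi_{11|x} + \pi_{01|x}} \right] (\pi_{11|x} + \pi_{10|x}),
\end{aligned} \tag{4}$$

$$\begin{aligned}
E\{Y_i(0, M_{i1}) \mid x\} = {} & \left[ E\{Y_i(0) \mid G_i = 11, x\} \frac{\pi_{11|x}}{\pi_{11|x} + \pi_{10|x}} + E\{Y_i(0) \mid G_i = 10, x\} \frac{\pi_{10|x}}{\pi_{11|x} + \pi_{10|x}} \right] (\pi_{11|x} + \pi_{01|x}) \\
& + \left[ E\{Y_i(0) \mid G_i = 00, x\} \frac{\pi_{00|x}}{\pi_{00|x} + \pi_{01|x}} + E\{Y_i(0) \mid G_i = 01, x\} \frac{\pi_{01|x}}{\pi_{00|x} + \pi_{01|x}} \right] (\pi_{00|x} + \pi_{10|x}).
\end{aligned} \tag{5}$$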


Each term of (4) and (5) is a product of a weighted average of an observable potential outcome, $Y_i(1)$
or $Y_i(0)$, and the sum of the proportions of two principal strata. This product reflects how information
on observable potential outcomes for specific principal strata is used for potential outcomes of the type
$Y_i(z, M_{iz'})$ for other principal strata.

In Example 1, according to (4), a weighted average of the observable potential outcomes for pain
intensity under oral morphine, $Y_i(1)$, for patients with $M_i(1) = 0$, who would self-administer a high level
of morphine sulphate, i.e., pain-intolerant patients $G_i = 00$ and defiant patients $G_i = 10$, provides
information on $Y_i(1, M_{i0})$ for patients with $M_i(0) = 0$, who would self-administer a high level of morphine
sulphate under the placebo, i.e., compliant patients $G_i = 01$ and pain-intolerant patients $G_i = 00$.
Moreover, the distributions of $Y_i(1)$ for patients with $M_i(1) = 1$, i.e., pain-tolerant patients $G_i = 11$ and
compliant patients $G_i = 01$, are used to impute $Y_i(1, M_{i0})$ for patients with $M_i(0) = 1$, i.e., defiant patients
$G_i = 10$ and pain-tolerant patients $G_i = 11$. A similar interpretation applies to (5).

Proposition 2 also provides valuable insights into the meaning of the natural indirect effects.
Specifically, we have the following propositions, in which, for notational simplicity, we use
$\mathrm{ACE}_M(x) = E\{M_i(1) - M_i(0) \mid x\}$ to denote the conditional average causal effect of the treatment on the
mediator.


PROPOSITION 3. Under Assumption 1, if either Assumption 2 or 3 holds, then

$$\mathrm{NIE}(1 \mid x) = \mathrm{ACE}_M(x) \times [E\{Y_i(1) \mid G_i = 11 \text{ or } 01, x\} - E\{Y_i(1) \mid G_i = 00 \text{ or } 10, x\}], \tag{6}$$

$$\mathrm{NIE}(0 \mid x) = \mathrm{ACE}_M(x) \times [E\{Y_i(0) \mid G_i = 11 \text{ or } 10, x\} - E\{Y_i(0) \mid G_i = 00 \text{ or } 01, x\}]. \tag{7}$$


Proposition 3 decomposes each natural indirect effect into the product of the average effect of the treatment
on the mediator and a comparison of potential outcomes across different principal strata.
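
The algebra linking (4) and (6) can be confirmed numerically. The sketch below takes hypothetical stratum proportions and stratum means of $Y_i(1)$ at a single covariate level, evaluates $E\{Y_i(1, M_{i0}) \mid x\}$ through (4), and compares the resulting $\mathrm{NIE}(1 \mid x)$ with the product form in (6). All numbers and names are assumptions made for illustration.

```python
# Hypothetical stratum proportions pi[g] and means mu1[g] = E{Y(1) | G = g}.
pi = {"00": 0.30, "01": 0.40, "10": 0.10, "11": 0.20}
mu1 = {"00": 6.0, "01": 3.0, "10": 5.0, "11": 2.0}


def pooled_mean(strata):
    """E{Y(1) | G in strata}: stratum means weighted by relative stratum size."""
    return sum(pi[g] * mu1[g] for g in strata) / sum(pi[g] for g in strata)


mean_m1_is_1 = pooled_mean(["11", "01"])  # units with M(1) = 1
mean_m1_is_0 = pooled_mean(["00", "10"])  # units with M(1) = 0

# Equation (4): outcomes of units with M(1) = m are reweighted by pr{M(0) = m}.
cross_world = (mean_m1_is_0 * (pi["00"] + pi["01"])
               + mean_m1_is_1 * (pi["11"] + pi["10"]))

mean_y1 = sum(pi[g] * mu1[g] for g in pi)  # E{Y(1)}
nie1_direct = mean_y1 - cross_world        # NIE(1) = E{Y(1)} - E{Y(1, M(0))}

ace_m = pi["01"] - pi["10"]                           # E{M(1) - M(0)}
nie1_product = ace_m * (mean_m1_is_1 - mean_m1_is_0)  # right-hand side of (6)
```

The two quantities `nie1_direct` and `nie1_product` agree up to floating-point error for any choice of proportions with both pooled groups non-empty, since the identity is purely algebraic once the mediation formula is assumed to hold.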

Under Assumptions 1 and 2, if we further introduce homogeneity assumptions of the potential outcome
distributions across principal strata, then the second terms on the right-hand sides of (6) and (7) can be
interpreted as the average causal effects of the mediator on the outcome.


PROPOSITION 4. Suppose Assumptions 1 and 2 hold. If $Y_i(1, m) \perp\!\!\!\perp G_i \mid X_i$, then

$$\mathrm{NIE}(1 \mid x) = \mathrm{ACE}_M(x) \times E\{Y_i(1, 1) - Y_i(1, 0) \mid x\}. \tag{8}$$

If $Y_i(0, m) \perp\!\!\!\perp G_i \mid X_i$, then

$$\mathrm{NIE}(0 \mid x) = \mathrm{ACE}_M(x) \times E\{Y_i(0, 1) - Y_i(0, 0) \mid x\}. \tag{9}$$


The independence assumption $Y_i(z, m) \perp\!\!\!\perp G_i \mid X_i$ for a fixed value of $z$ is implied by Assumption 3, so
both (8) and (9) hold under Assumptions 1 and 3. Formulas (8) and (9) reflect the intuition of mediation:
the treatment affects the mediator, and then the mediator affects the outcome given the treatment level
$Z_i = z$ with either $z = 0$ or $z = 1$.


5. MONOTONICITY IN MEDIATION ANALYSIS 
We now investigate the role of monotonicity in mediation analysis: 
Assumption 4 (Monotonicity). $M_i(1) \geq M_i(0)$ for all $i$.


Assumption 4 rules out negative effects of the treatment on the mediator, but an alternative version
of monotonicity, ruling out positive effects of the treatment on the mediator, could be considered. The
plausibility of monotonicity in mediation analysis strongly depends on the substantive setting. In Example
1, monotonicity, ruling out the existence of defiant patients with $G_i = 10$, is likely plausible due to the
pharmacological characteristics of the active placebo under control. See also Baccini et al. (2017).

When the treatment and the mediator are both binary, the following proposition holds under the mono- 
tonicity in Assumption 4. 


PROPOSITION 5. Under Assumptions 1 and 4, Assumptions 2 and 3 are equivalent.


Proposition 5 implies that, under ignorability of treatment assignment and monotonicity, sequential
ignorability and strong principal ignorability are equivalent, so we can use the mediation formula in (3) to
identify and estimate natural direct and indirect effects invoking either Assumption 2 or 3, whichever
is easier to justify in a specific case study. In Example 1, Assumption 1 holds by design and Assumption 4
is very plausible. Therefore, we can identify the natural direct and indirect effects using (3), if we
can provide convincing arguments on the plausibility of either Assumption 2, i.e., no unmeasured
confounding between the morphine consumption and pain intensity, or Assumption 3, i.e., homogeneity of
the distributions of the potential outcomes across pain-tolerant, pain-intolerant, and compliant patients.
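
A standard consequence of Assumptions 1 and 4, stated here as background rather than as a result of this paper, is that the principal strata proportions are determined by the two observable mediator margins: with defiant patients ruled out, units with $M_i^{\mathrm{obs}} = 1$ under control must belong to $G_i = 11$ and units with $M_i^{\mathrm{obs}} = 0$ under treatment must belong to $G_i = 00$. The sketch below encodes this within one covariate level; the function name and the numerical margins are hypothetical.

```python
def strata_proportions(p_m1_given_z1, p_m1_given_z0):
    """Principal strata proportions under Assumptions 1 and 4 at one covariate level.

    p_m1_given_z1 = pr(M_obs = 1 | Z = 1), p_m1_given_z0 = pr(M_obs = 1 | Z = 0).
    """
    if p_m1_given_z1 < p_m1_given_z0:
        raise ValueError("margins incompatible with monotonicity M(1) >= M(0)")
    return {
        "11": p_m1_given_z0,                  # M(0) = 1 forces M(1) = 1
        "00": 1.0 - p_m1_given_z1,            # M(1) = 0 forces M(0) = 0
        "01": p_m1_given_z1 - p_m1_given_z0,  # remaining mass: mediator moved by treatment
        "10": 0.0,                            # ruled out by Assumption 4
    }


pi = strata_proportions(0.7, 0.4)  # hypothetical population margins
```

The check on the margins applies to population quantities; with sample proportions a small violation can arise from sampling variability alone.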


6. IDENTIFICATION UNDER GENERALIZED WEAK PRINCIPAL IGNORABILITY 


Here we propose a set of alternative assumptions for identification of natural direct and indirect effects, 
involving generalizations of weak principal ignorability assumptions (Jo & Stuart, 2009; Ding & Lu, 
2017; Feller et al., 2016) to potential outcomes of the form Y;(z,m): 


Assumption 5. $Y_i(1, 1) \perp\!\!\!\perp M_i(0) \mid \{M_i(1) = 1, X_i\}$;
Assumption 6. $Y_i(1, 0) \perp\!\!\!\perp M_i(1) \mid \{M_i(0) = 0, X_i\}$.


Assumption 5 is a generalized weak principal ignorability of $Y_i(1, 1)$ across strata $G_i = 11$ and
$G_i = 01$, and Assumption 6 is a generalized weak principal ignorability of $Y_i(1, 0)$ across strata $G_i = 00$ and
$G_i = 01$. Assumptions 5 and 6 together are weaker than Assumption 3, because the independence statements
in Assumptions 5 and 6 refer to specific potential outcomes and are conditional on specific values of $M_i(0)$
and $M_i(1)$.

In general, we cannot rank sequential ignorability and Assumptions 5 and 6. However, when the treat- 
ment and the mediator are both binary, relying on Proposition 5, we have the following result. 


PROPOSITION 6. Under Assumptions 1 and 4, Assumption 2 implies Assumptions 5 and 6.


Proposition 6 implies that the set of Assumptions 1, 4, 5 and 6 is weaker than the set of Assumptions 1, 
4 and 2 or 3, and thus may be more plausible. Therefore, it might be valuable to investigate whether we 
can identify natural direct and indirect effects under Assumptions 1, 4, 5 and 6. 

Assumptions 5 and 6 involve homogeneity of two different potential outcomes, $Y_i(1, 1)$ and $Y_i(1, 0)$,
across two different sets of principal strata. In particular, Assumption 5 states that the distribution of
$Y_i(1, 1)$ is the same for strata $G_i = 11$ and $G_i = 01$, i.e., pain-tolerant and compliant patients for whom
$Y_i(1, 1) = Y_i(1, M_{i1}) = Y_i(1)$. Assumption 5 implies that we can use the observed data to estimate the
distribution of $Y_i(1, 1)$ for the two principal strata that are mixed together in the observed set with $Z_i = 1$
and $M_i^{\mathrm{obs}} = 1$, i.e., patients who are treated with preoperative oral morphine and who self-administer a
low level of morphine sulphate after surgery.

The second homogeneity in Assumption 6 refers to the potential outcome $Y_i(1, 0)$ across strata $G_i = 00$
and $G_i = 01$, i.e., pain-intolerant and compliant patients for whom $Y_i(1, 0) = Y_i(1, M_{i0})$. This
homogeneity has a slightly different flavor, because it allows for identifying the a priori counterfactual for
compliant patients $G_i = 01$ using information on $Y_i(1, 0)$ for pain-intolerant patients $G_i = 00$. Under
Assumptions 1 and 4, we can estimate the distribution of $Y_i(1, 0)$ for $G_i = 00$ using information on the
observed outcome for units with $Z_i = 1$ and $M_i^{\mathrm{obs}} = 0$, i.e., patients who are treated with preoperative
oral morphine and who self-administer a high level of morphine sulphate after surgery.

We formalize these arguments in the following proposition. 
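PROPOSITION 7. If Assumptions 1, 4, 5 and 6 hold, then

$$\begin{aligned}
E\{Y_i(1, M_{i0}) \mid x\} &= \sum_{m=0,1} E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = m, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = m \mid Z_i = 0, x), \\
\mathrm{NDE}(0 \mid x) &= \sum_{m=0,1} E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = m, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = m \mid Z_i = 0, x) - E(Y_i^{\mathrm{obs}} \mid Z_i = 0, x), \\
\mathrm{NIE}(1 \mid x) &= \{E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 1, x) - E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 0, x)\} \\
&\quad \times \{E(M_i^{\mathrm{obs}} \mid Z_i = 1, x) - E(M_i^{\mathrm{obs}} \mid Z_i = 0, x)\}.
\end{aligned}$$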


In the Supplementary Material, we give analogous results for NDE(1 | x) and NIE(0 | x). 
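
The identification formulas of Proposition 7 translate directly into sample analogues. The following sketch, written for a single covariate level, is a minimal plug-in calculation and not an estimator proposed or evaluated in the paper; the function names and the eight records are hypothetical.

```python
def cell_mean(records, z, m=None):
    """Mean of Y_obs among units with Z = z (and M_obs = m when m is given)."""
    ys = [y for zi, mi, y in records if zi == z and (m is None or mi == m)]
    if not ys:
        raise ValueError(f"no units with Z = {z}, M_obs = {m}")
    return sum(ys) / len(ys)


def mediator_rate(records, z):
    """pr(M_obs = 1 | Z = z), equal to E(M_obs | Z = z) for a binary mediator."""
    ms = [mi for zi, mi, _ in records if zi == z]
    if not ms:
        raise ValueError(f"no units with Z = {z}")
    return sum(ms) / len(ms)


def proposition7(records):
    """Plug-in NDE(0) and NIE(1) from (Z, M_obs, Y_obs) records at one covariate level."""
    p0 = mediator_rate(records, 0)
    p1 = mediator_rate(records, 1)
    y_m1 = cell_mean(records, 1, 1)  # E(Y_obs | Z = 1, M_obs = 1)
    y_m0 = cell_mean(records, 1, 0)  # E(Y_obs | Z = 1, M_obs = 0)
    cross_world = y_m0 * (1.0 - p0) + y_m1 * p0  # estimate of E{Y(1, M(0))}
    nde0 = cross_world - cell_mean(records, 0)
    nie1 = (y_m1 - y_m0) * (p1 - p0)
    return nde0, nie1


# Hypothetical records (Z, M_obs, Y_obs), for illustration only.
data = [(1, 0, 4.0), (1, 0, 6.0), (1, 1, 2.0), (1, 1, 2.0),
        (0, 0, 7.0), (0, 0, 7.0), (0, 0, 10.0), (0, 1, 3.0)]
nde0, nie1 = proposition7(data)
total = cell_mean(data, 1) - cell_mean(data, 0)  # difference in arm means
```

By construction `nde0 + nie1` equals the difference in arm means `total`, mirroring $\mathrm{ACE}(x) = \mathrm{NDE}(0 \mid x) + \mathrm{NIE}(1 \mid x)$ under randomization.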


7. DISCUSSION 


Generalized strong principal ignorability in Assumption 3 implies ignorability of the mediator in As- 
sumption 2. Proposition 5, however, shows that under monotonicity, the two assumptions are equivalent 
with a binary mediator. This allows us to derive alternative and weaker assumptions to identify natural 
direct and indirect effects, namely the weak principal ignorability in Assumptions 5 and 6. Unfortunately, 
monotonicity, ignorability of the mediator and weak principal ignorability assumptions are not directly 
testable from the observed data, and they may be implausible in some contexts. Therefore, it is valuable 
to think about what we can learn from the data about the causal estimands of interest when some of the 
underlying critical assumptions cannot be invoked. 
S1. NDE(1 | x) AND NIE(0 | x) UNDER GENERALIZED WEAK PRINCIPAL IGNORABILITY


In §6 we proposed a set of alternative assumptions for identification of natural direct and indirect effects,
but focused only on $\mathrm{NDE}(0 \mid x)$ and $\mathrm{NIE}(1 \mid x)$. For completeness, here we derive similar results for
$\mathrm{NDE}(1 \mid x)$ and $\mathrm{NIE}(0 \mid x)$.


Assumption S1. $Y_i(0, 0) \perp\!\!\!\perp M_i(1) \mid \{M_i(0) = 0, X_i\}$.


Assumption S2. $Y_i(0, 1) \perp\!\!\!\perp M_i(0) \mid \{M_i(1) = 1, X_i\}$.


Assumption S1 is the generalized weak principal ignorability of $Y_i(0, 0)$ across strata $G_i = 00$ and
$G_i = 01$, and Assumption S2 is the generalized weak principal ignorability of $Y_i(0, 1)$ across strata
$G_i = 11$ and $G_i = 01$. Like Assumptions 5 and 6, Assumptions S1 and S2 are weaker than the generalized
strong principal ignorability in Assumption 3. Relying on Proposition 5, we have the following result
analogous to Proposition 6.


PROPOSITION S1. Under Assumptions 1 and 4, Assumption 2 implies Assumptions S1 and S2.


Assumptions S1 and S2 involve homogeneity of two different potential outcomes, $Y_i(0, 0)$ and $Y_i(0, 1)$,
across two different sets of principal strata. In particular, Assumption S1 states that the distribution of
$Y_i(0, 0)$ is the same for strata $G_i = 00$ and $G_i = 01$, for whom $Y_i(0, 0) = Y_i(0, M_{i0}) = Y_i(0)$. Although
the distribution of the potential outcome can be identified under Assumption 1, Assumption S1 allows
estimating from the observed data the distribution of $Y_i(0, 0)$ for the two principal strata that are mixed
together in the observed set with $Z_i = 0$ and $M_i^{\mathrm{obs}} = 0$. The second homogeneity, in Assumption S2, refers
to the potential outcome $Y_i(0, 1)$ across strata $G_i = 11$ and $G_i = 01$, for whom $Y_i(0, 1) = Y_i(0, M_{i1})$.
This homogeneity has a slightly different flavor, because it allows for identifying the a priori
counterfactual for stratum $G_i = 01$ using information on $Y_i(0, 1)$ for $G_i = 11$, which, in turn, can be estimated
using information on the observed outcome for units with $Z_i = 0$ and $M_i^{\mathrm{obs}} = 1$ under Assumptions 1 and 4.
We formalize these arguments below, in a result analogous to Proposition 7.
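PROPOSITION S2. If Assumptions 1, 4, S1 and S2 hold, then

$$\begin{aligned}
E\{Y_i(0, M_{i1}) \mid x\} &= \sum_{m=0,1} E(Y_i^{\mathrm{obs}} \mid Z_i = 0, M_i^{\mathrm{obs}} = m, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = m \mid Z_i = 1, x), \\
\mathrm{NDE}(1 \mid x) &= E(Y_i^{\mathrm{obs}} \mid Z_i = 1, x) - \sum_{m=0,1} E(Y_i^{\mathrm{obs}} \mid Z_i = 0, M_i^{\mathrm{obs}} = m, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = m \mid Z_i = 1, x), \\
\mathrm{NIE}(0 \mid x) &= \{E(Y_i^{\mathrm{obs}} \mid Z_i = 0, M_i^{\mathrm{obs}} = 1, x) - E(Y_i^{\mathrm{obs}} \mid Z_i = 0, M_i^{\mathrm{obs}} = 0, x)\} \\
&\quad \times \{E(M_i^{\mathrm{obs}} \mid Z_i = 1, x) - E(M_i^{\mathrm{obs}} \mid Z_i = 0, x)\}.
\end{aligned}$$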


S2. PROOFS
S2.1. Proof of the mediation formula (3)


We review the proof of the mediation formula (3) under Assumptions 1 and 2:


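$$\begin{aligned}
E\{Y_i(z, M_{iz'}) \mid x\}
&= \sum_{m=0,1} E\{Y_i(z, m) \mid M_i(z') = m, x\} \times \mathrm{pr}\{M_i(z') = m \mid x\} \\
&= \sum_{m=0,1} E\{Y_i(z, m) \mid Z_i = z', M_i(z') = m, x\} \times \mathrm{pr}\{M_i(z') = m \mid x\} \\
&= \sum_{m=0,1} E\{Y_i(z, m) \mid Z_i = z', x\} \times \mathrm{pr}\{M_i(z') = m \mid x\} \\
&= \sum_{m=0,1} E\{Y_i(z, m) \mid Z_i = z, x\} \times \mathrm{pr}\{M_i(z') = m \mid Z_i = z', x\} \\
&= \sum_{m=0,1} E\{Y_i(z, m) \mid Z_i = z, M_i(z) = m, x\} \times \mathrm{pr}\{M_i(z') = m \mid Z_i = z', x\} \\
&= \sum_{m=0,1} E\{Y_i^{\mathrm{obs}} \mid Z_i = z, M_i^{\mathrm{obs}} = m, x\} \times \mathrm{pr}(M_i^{\mathrm{obs}} = m \mid Z_i = z', x).
\end{aligned}$$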


Assumption 1, ignorability of the treatment, implies $Y_i(z, m) \perp\!\!\!\perp Z_i \mid \{M_i(z'), X_i\}$ and
$M_i(z') \perp\!\!\!\perp Z_i \mid X_i$, and ensures the second and the fourth equalities. Assumption 2, ignorability of
the mediator, ensures the third and fifth equalities. Consistency ensures the last equality with
$M_i^{\mathrm{obs}} = M_i(Z_i)$ and $Y_i^{\mathrm{obs}} = Y_i(Z_i, M_i^{\mathrm{obs}})$.


S2.2. Proof of Proposition 1: mediation formula (3) under Assumptions 1 and 3


Assumptions 1 and 3 together imply Assumption 2. Therefore, Proposition 1 follows from the proof in
Section S2.1.
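
The implication used in this proof is stated without derivation. One way to see it, offered as a sketch and not as the authors' own argument, is the following chain for arbitrary $z, z', m, m'$, written in terms of conditional distributions:

```latex
\begin{align*}
\mathrm{pr}\{Y_i(z,m) \mid M_i(z') = m', Z_i = z', x\}
  &= \mathrm{pr}\{Y_i(z,m) \mid M_i(z') = m', x\}
     && \text{(Assumption 1)} \\
  &= \mathrm{pr}\{Y_i(z,m) \mid x\}
     && \text{(Assumption 3, since $M_i(z')$ is a function of $G_i$)} \\
  &= \mathrm{pr}\{Y_i(z,m) \mid Z_i = z', x\}
     && \text{(Assumption 1)}.
\end{align*}
```

The first and last expressions coincide for every $m'$, which is the conditional independence $Y_i(z,m) \perp\!\!\!\perp M_i(z') \mid (Z_i = z', X_i)$ required by Assumption 2.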


S2.3. Proof of Proposition 2


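Consider $E\{Y_i(1, M_{i0}) \mid x\}$. By consistency, the mediation formula (3) can be re-written in terms of
potential outcomes as

$$\begin{aligned}
E\{Y_i(1, M_{i0}) \mid x\}
&= \sum_{m=0,1} E\{Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = m, x\} \times \mathrm{pr}\{M_i^{\mathrm{obs}} = m \mid Z_i = 0, x\} \\
&= \sum_{m=0,1} E\{Y_i(1) \mid M_i(1) = m, x\} \times \mathrm{pr}\{M_i(0) = m \mid x\},
\end{aligned}$$

where the last equality follows from Assumption 1. By the law of total probability, each term in the last
equality can further be written in terms of principal strata. Formally,

$$\begin{aligned}
E\{Y_i(1, M_{i0}) \mid x\}
= {} & [E\{Y_i(1) \mid M_i(0) = 0, M_i(1) = 0, x\} \times \mathrm{pr}\{M_i(0) = 0 \mid M_i(1) = 0, x\} \\
& + E\{Y_i(1) \mid M_i(0) = 1, M_i(1) = 0, x\} \times \mathrm{pr}\{M_i(0) = 1 \mid M_i(1) = 0, x\}] \times \mathrm{pr}\{M_i(0) = 0 \mid x\} \\
& + [E\{Y_i(1) \mid M_i(0) = 0, M_i(1) = 1, x\} \times \mathrm{pr}\{M_i(0) = 0 \mid M_i(1) = 1, x\} \\
& + E\{Y_i(1) \mid M_i(0) = 1, M_i(1) = 1, x\} \times \mathrm{pr}\{M_i(0) = 1 \mid M_i(1) = 1, x\}] \times \mathrm{pr}\{M_i(0) = 1 \mid x\} \\
= {} & \left[ E\{Y_i(1) \mid G_i = 00, x\} \frac{\pi_{00|x}}{\pi_{00|x} + \pi_{10|x}} + E\{Y_i(1) \mid G_i = 10, x\} \frac{\pi_{10|x}}{\pi_{00|x} + \pi_{10|x}} \right] (\pi_{00|x} + \pi_{01|x}) \\
& + \left[ E\{Y_i(1) \mid G_i = 01, x\} \frac{\pi_{01|x}}{\pi_{01|x} + \pi_{11|x}} + E\{Y_i(1) \mid G_i = 11, x\} \frac{\pi_{11|x}}{\pi_{01|x} + \pi_{11|x}} \right] (\pi_{10|x} + \pi_{11|x}).
\end{aligned}$$

Similarly, we can prove the result for $E\{Y_i(0, M_{i1}) \mid x\}$.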


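S2.4. Proof of Proposition 3
Consider $\mathrm{NIE}(1 \mid x)$. Define

$$w_1 = \frac{\pi_{11|x} + \pi_{10|x}}{\pi_{11|x} + \pi_{01|x}} = \frac{\mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 0, x)}{\mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 1, x)},$$

$$w_2 = \frac{\pi_{00|x} + \pi_{01|x}}{\pi_{00|x} + \pi_{10|x}} = \frac{1 - \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 0, x)}{1 - \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 1, x)} = \frac{\mathrm{pr}(M_i^{\mathrm{obs}} = 0 \mid Z_i = 0, x)}{\mathrm{pr}(M_i^{\mathrm{obs}} = 0 \mid Z_i = 1, x)}.$$

The quantities $1/w_1$ and $1/w_2$ can be interpreted as causal effects of the treatment on the mediator on the
risk ratio scale. Replacing $w_1$ and $w_2$ in Proposition 2, we have

$$\begin{aligned}
E\{Y_i(1, M_{i0}) \mid x\} = {} & w_1 \times [\pi_{11|x} E\{Y_i(1) \mid G_i = 11, x\} + \pi_{01|x} E\{Y_i(1) \mid G_i = 01, x\}] \\
& + w_2 \times [\pi_{00|x} E\{Y_i(1) \mid G_i = 00, x\} + \pi_{10|x} E\{Y_i(1) \mid G_i = 10, x\}].
\end{aligned}$$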


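Therefore,

$$\begin{aligned}
\mathrm{NIE}(1 \mid x) = {} & E\{Y_i(1) \mid x\} - E\{Y_i(1, M_{i0}) \mid x\} \\
= {} & \sum_{g = 11, 01, 00, 10} \pi_{g|x} E\{Y_i(1) \mid G_i = g, x\} \\
& - \big( w_1 \times [\pi_{11|x} E\{Y_i(1) \mid G_i = 11, x\} + \pi_{01|x} E\{Y_i(1) \mid G_i = 01, x\}] \\
& \quad + w_2 \times [\pi_{00|x} E\{Y_i(1) \mid G_i = 00, x\} + \pi_{10|x} E\{Y_i(1) \mid G_i = 10, x\}] \big) \\
= {} & (1 - w_1) \times [\pi_{11|x} E\{Y_i(1) \mid G_i = 11, x\} + \pi_{01|x} E\{Y_i(1) \mid G_i = 01, x\}] \\
& + (1 - w_2) \times [\pi_{00|x} E\{Y_i(1) \mid G_i = 00, x\} + \pi_{10|x} E\{Y_i(1) \mid G_i = 10, x\}],
\end{aligned}$$

which is a weighted combination of the average potential outcomes under treatment across principal strata
with weights depending on the proportions of the principal strata and the causal effects of the treatment
on the mediator. Because

$$1 - w_1 = 1 - \frac{\pi_{11|x} + \pi_{10|x}}{\pi_{11|x} + \pi_{01|x}} = \frac{\pi_{01|x} - \pi_{10|x}}{\pi_{11|x} + \pi_{01|x}} = \frac{E\{M_i(1) - M_i(0) \mid x\}}{\pi_{11|x} + \pi_{01|x}},$$

$$1 - w_2 = 1 - \frac{\pi_{00|x} + \pi_{01|x}}{\pi_{00|x} + \pi_{10|x}} = -\frac{\pi_{01|x} - \pi_{10|x}}{\pi_{00|x} + \pi_{10|x}} = -\frac{E\{M_i(1) - M_i(0) \mid x\}}{\pi_{00|x} + \pi_{10|x}},$$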
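we can further simplify the natural indirect effect as

$$\begin{aligned}
\mathrm{NIE}(1 \mid x) = {} & E\{M_i(1) - M_i(0) \mid x\} \\
& \times \bigg[ \frac{\pi_{11|x}}{\pi_{11|x} + \pi_{01|x}} E\{Y_i(1) \mid G_i = 11, x\} + \frac{\pi_{01|x}}{\pi_{11|x} + \pi_{01|x}} E\{Y_i(1) \mid G_i = 01, x\} \\
& \quad - \frac{\pi_{00|x}}{\pi_{00|x} + \pi_{10|x}} E\{Y_i(1) \mid G_i = 00, x\} - \frac{\pi_{10|x}}{\pi_{00|x} + \pi_{10|x}} E\{Y_i(1) \mid G_i = 10, x\} \bigg] \\
= {} & E\{M_i(1) - M_i(0) \mid x\} \times [E\{Y_i(1) \mid G_i = 11 \text{ or } 01, x\} - E\{Y_i(1) \mid G_i = 00 \text{ or } 10, x\}].
\end{aligned}$$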


Similarly, we can prove the result for NIE(0 | x). 


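S2.5. Proof of Proposition 4


Consider the results in Proposition 3. For $G_i = 11$ or $01$, we have $M_i(1) = 1$, and for $G_i = 10$ or $00$,
we have $M_i(1) = 0$. Analogously, $M_i(0) = 1$ for $G_i = 11$ or $10$, and $M_i(0) = 0$ for $G_i = 00$ or $01$. If we
invoke the potential outcomes with double index $Y_i(z, m)$ and use $Y_i(z) = Y_i(z, M_{iz})$, then we can rewrite
the results in Proposition 3 as

$$\begin{aligned}
\mathrm{NIE}(1 \mid x) &= E\{M_i(1) - M_i(0) \mid x\} \times [E\{Y_i(1, 1) \mid G_i = 11 \text{ or } 01, x\} - E\{Y_i(1, 0) \mid G_i = 00 \text{ or } 10, x\}], \\
\mathrm{NIE}(0 \mid x) &= E\{M_i(1) - M_i(0) \mid x\} \times [E\{Y_i(0, 1) \mid G_i = 11 \text{ or } 10, x\} - E\{Y_i(0, 0) \mid G_i = 00 \text{ or } 01, x\}].
\end{aligned}$$

Therefore, the proofs of (8) and (9) follow directly from applying the homogeneity assumptions
$Y_i(1, m) \perp\!\!\!\perp G_i \mid X_i$ and $Y_i(0, m) \perp\!\!\!\perp G_i \mid X_i$, respectively.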


S2·6. Proof of Proposition 5
We need a lemma to prove Proposition 5. 


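LEMMA S1. Consider a general random variable $R$, and two binary random variables $R_1$ and $R_0$ satisfying monotonicity $R_1 \geq R_0$. The following independence relationships are equivalent:
$$
R \perp\!\!\!\perp R_1 \text{ and } R \perp\!\!\!\perp R_0 \iff R \perp\!\!\!\perp (R_1, R_0) \iff R \perp\!\!\!\perp R_1 \mid R_0 \text{ and } R \perp\!\!\!\perp R_0 \mid R_1.
$$
Proof of Lemma S1. We need only to prove that
$$
R \perp\!\!\!\perp R_1 \text{ and } R \perp\!\!\!\perp R_0 \implies R \perp\!\!\!\perp (R_1, R_0),
$$
because the other implications are straightforward.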
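From $R \perp\!\!\!\perp R_1$ we have $\mathrm{pr}(R \mid R_1 = 1) = \mathrm{pr}(R \mid R_1 = 0)$, which can be decomposed as
$$
\begin{aligned}
& \mathrm{pr}(R \mid R_1 = 1, R_0 = 1)\,\mathrm{pr}(R_0 = 1 \mid R_1 = 1) + \mathrm{pr}(R \mid R_1 = 1, R_0 = 0)\,\mathrm{pr}(R_0 = 0 \mid R_1 = 1) \\
&= \mathrm{pr}(R \mid R_1 = 0, R_0 = 1)\,\mathrm{pr}(R_0 = 1 \mid R_1 = 0) + \mathrm{pr}(R \mid R_1 = 0, R_0 = 0)\,\mathrm{pr}(R_0 = 0 \mid R_1 = 0).
\end{aligned}
$$
Monotonicity $R_1 \geq R_0$ further simplifies the above equation to
$$
\begin{aligned}
& \mathrm{pr}(R \mid R_1 = 1, R_0 = 1)\,\mathrm{pr}(R_0 = 1 \mid R_1 = 1) + \mathrm{pr}(R \mid R_1 = 1, R_0 = 0)\,\mathrm{pr}(R_0 = 0 \mid R_1 = 1) \\
&= \mathrm{pr}(R \mid R_1 = 0, R_0 = 0). \qquad \text{(S1)}
\end{aligned}
$$
Similarly, from $R \perp\!\!\!\perp R_0$ we have $\mathrm{pr}(R \mid R_0 = 1) = \mathrm{pr}(R \mid R_0 = 0)$, which can be decomposed as
$$
\begin{aligned}
& \mathrm{pr}(R \mid R_1 = 1, R_0 = 1)\,\mathrm{pr}(R_1 = 1 \mid R_0 = 1) + \mathrm{pr}(R \mid R_1 = 0, R_0 = 1)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 1) \\
&= \mathrm{pr}(R \mid R_1 = 1, R_0 = 0)\,\mathrm{pr}(R_1 = 1 \mid R_0 = 0) + \mathrm{pr}(R \mid R_1 = 0, R_0 = 0)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0).
\end{aligned}
$$
Monotonicity $R_1 \geq R_0$ further simplifies the above equation to
$$
\begin{aligned}
& \mathrm{pr}(R \mid R_1 = 1, R_0 = 1) \qquad \text{(S2)} \\
&= \mathrm{pr}(R \mid R_1 = 1, R_0 = 0)\,\mathrm{pr}(R_1 = 1 \mid R_0 = 0) + \mathrm{pr}(R \mid R_1 = 0, R_0 = 0)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0).
\end{aligned}
$$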
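Replacing $\mathrm{pr}(R \mid R_1 = 0, R_0 = 0)$ in (S2) by its expression in (S1), we have
$$
\begin{aligned}
& \mathrm{pr}(R \mid R_1 = 1, R_0 = 1) \\
&= \mathrm{pr}(R \mid R_1 = 1, R_0 = 0)\,\mathrm{pr}(R_1 = 1 \mid R_0 = 0) \\
&\quad + \{ \mathrm{pr}(R \mid R_1 = 1, R_0 = 1)\,\mathrm{pr}(R_0 = 1 \mid R_1 = 1) + \mathrm{pr}(R \mid R_1 = 1, R_0 = 0)\,\mathrm{pr}(R_0 = 0 \mid R_1 = 1) \} \\
&\qquad \times \mathrm{pr}(R_1 = 0 \mid R_0 = 0). \qquad \text{(S3)}
\end{aligned}
$$
Combining the terms involving $\mathrm{pr}(R \mid R_1 = 1, R_0 = 1)$ and $\mathrm{pr}(R \mid R_1 = 1, R_0 = 0)$ respectively, (S3) above implies
$$
\begin{aligned}
& \mathrm{pr}(R \mid R_1 = 1, R_0 = 1) \times \{ 1 - \mathrm{pr}(R_0 = 1 \mid R_1 = 1)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0) \} \qquad \text{(S4)} \\
&= \mathrm{pr}(R \mid R_1 = 1, R_0 = 0) \times \{ \mathrm{pr}(R_1 = 1 \mid R_0 = 0) + \mathrm{pr}(R_0 = 0 \mid R_1 = 1)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0) \}.
\end{aligned}
$$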


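Because
$$
\begin{aligned}
& \{ 1 - \mathrm{pr}(R_0 = 1 \mid R_1 = 1)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0) \} \\
&\quad - \{ \mathrm{pr}(R_1 = 1 \mid R_0 = 0) + \mathrm{pr}(R_0 = 0 \mid R_1 = 1)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0) \} \\
&= 1 - \mathrm{pr}(R_1 = 1 \mid R_0 = 0) - \mathrm{pr}(R_1 = 0 \mid R_0 = 0) = 0,
\end{aligned}
$$
the two factors in braces in (S4) are equal:
$$
1 - \mathrm{pr}(R_0 = 1 \mid R_1 = 1)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0) = \mathrm{pr}(R_1 = 1 \mid R_0 = 0) + \mathrm{pr}(R_0 = 0 \mid R_1 = 1)\,\mathrm{pr}(R_1 = 0 \mid R_0 = 0).
$$
Therefore, (S4) implies that $\mathrm{pr}(R \mid R_1 = 1, R_0 = 1) = \mathrm{pr}(R \mid R_1 = 1, R_0 = 0)$. Replacing $\mathrm{pr}(R \mid R_1 = 1, R_0 = 1)$ in (S1) by $\mathrm{pr}(R \mid R_1 = 1, R_0 = 0)$, we further deduce that $\mathrm{pr}(R \mid R_1 = 1, R_0 = 0) = \mathrm{pr}(R \mid R_1 = 0, R_0 = 0)$. Therefore, we have shown that
$$
\mathrm{pr}(R \mid R_1 = 1, R_0 = 0) = \mathrm{pr}(R \mid R_1 = 1, R_0 = 1) = \mathrm{pr}(R \mid R_1 = 0, R_0 = 0). \qquad \text{(S5)}
$$
Because monotonicity $R_1 \geq R_0$ rules out $(R_1 = 0, R_0 = 1)$, the above relationships in (S5) imply $R \perp\!\!\!\perp (R_1, R_0)$.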
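As a numerical illustration of the lemma, (S1) and (S2) can be read as two linear equations in the unknowns $\mathrm{pr}(R \mid R_1 = 1, R_0 = 1)$ and $\mathrm{pr}(R \mid R_1 = 1, R_0 = 0)$ for a fixed event of $R$. The sketch below solves that system by Cramer's rule; the function name `solve_s1_s2` and the input values are hypothetical, chosen only to show that the solution collapses to $\mathrm{pr}(R \mid R_1 = 0, R_0 = 0)$.

```python
def solve_s1_s2(q11, q10, q00, p00):
    """Solve (S1)-(S2) for (p11, p10), where pjk = pr(R | R1 = j, R0 = k) for a
    fixed event of R and qjk = pr(R1 = j, R0 = k). Monotonicity means q01 = 0."""
    a = q11 / (q11 + q10)   # pr(R0 = 1 | R1 = 1)
    c = q10 / (q10 + q00)   # pr(R1 = 1 | R0 = 0)
    # (S1): a * p11 + (1 - a) * p10 = p00
    # (S2):     p11 -       c * p10 = (1 - c) * p00
    det = a * (-c) - (1 - a)
    p11 = (p00 * (-c) - (1 - a) * (1 - c) * p00) / det
    p10 = (a * (1 - c) * p00 - p00) / det
    return p11, p10


# Illustrative inputs: three non-empty strata and pr(R | R1 = 0, R0 = 0) = 0.7.
p11, p10 = solve_s1_s2(q11=0.3, q10=0.4, q00=0.3, p00=0.7)
print(p11, p10)
```

Both returned values equal `p00` up to rounding, which is the content of (S5): the conditional distribution of $R$ is the same in all three strata that monotonicity allows.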
Proof of Proposition 5. Suppose that Assumption 3 holds. Then $Y_i(z, m) \perp\!\!\!\perp M_i(z') \mid X_i$ for all $z, z', m = 0, 1$, because $G_i = \{M_i(z), M_i(z')\}$. Assumption 2 follows from
$$
\begin{aligned}
\mathrm{pr}\{Y_i(z, m), M_i(z') \mid Z_i = z', X_i\} &= \mathrm{pr}\{Y_i(z, m), M_i(z') \mid X_i\} \\
&= \mathrm{pr}\{Y_i(z, m) \mid X_i\} \times \mathrm{pr}\{M_i(z') \mid X_i\} \\
&= \mathrm{pr}\{Y_i(z, m) \mid Z_i = z', X_i\} \times \mathrm{pr}\{M_i(z') \mid Z_i = z', X_i\},
\end{aligned}
$$
where the first equality and the last equality follow from Assumption 1.

Conversely, suppose that Assumption 2 holds. Assumption 1 implies
$$
\begin{aligned}
\mathrm{pr}\{Y_i(z, m), M_i(z') \mid Z_i = z', X_i\} &= \mathrm{pr}\{Y_i(z, m), M_i(z') \mid X_i\}, \\
\mathrm{pr}\{Y_i(z, m) \mid Z_i = z', X_i\} \times \mathrm{pr}\{M_i(z') \mid Z_i = z', X_i\} &= \mathrm{pr}\{Y_i(z, m) \mid X_i\} \times \mathrm{pr}\{M_i(z') \mid X_i\},
\end{aligned}
$$
which, coupled with Assumption 2, imply that $Y_i(z, m) \perp\!\!\!\perp M_i(z') \mid X_i$ for all $z, z', m = 0, 1$. Under Assumption 4, $M_i(1) \geq M_i(0)$, and therefore Assumption 3 follows from Lemma S1, with $R = Y_i(z, m)$, $R_0 = M_i(0)$ and $R_1 = M_i(1)$, conditional on $X_i$.


S2·7. Proof of Proposition 6


Proposition 5 ensures that, under Assumption 4, Assumption 2 implies Assumption 3. We need only to show that Assumption 3 implies Assumptions 5 and 6. Assumption 3 can be written as $Y_i(z, m) \perp\!\!\!\perp \{M_i(0), M_i(1)\} \mid X_i$ for all $z, m = 0, 1$, which further implies
$$
Y_i(z, m) \perp\!\!\!\perp M_i(0) \mid \{M_i(1), X_i\}, \qquad Y_i(z, m) \perp\!\!\!\perp M_i(1) \mid \{M_i(0), X_i\},
$$
and, in particular, with specific values of $z$, $m$, $M_i(0)$ and $M_i(1)$,
$$
Y_i(1, 1) \perp\!\!\!\perp M_i(0) \mid \{M_i(1) = 1, X_i\}, \qquad Y_i(1, 0) \perp\!\!\!\perp M_i(1) \mid \{M_i(0) = 0, X_i\}.
$$


S2·8. Proof of Proposition 7


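First, we prove the result for $Y_i(1, M_{i0})$. We can write its conditional mean given $X_i = x$ as a weighted average across principal strata:
$$
\begin{aligned}
E\{Y_i(1, M_{i0}) \mid x\} ={}& E\{Y_i(1, M_{i0}) \mid G_i = 00, x\}\,\pi_{00|x} + E\{Y_i(1, M_{i0}) \mid G_i = 01, x\}\,\pi_{01|x} \\
& + E\{Y_i(1, M_{i0}) \mid G_i = 11, x\}\,\pi_{11|x} + E\{Y_i(1, M_{i0}) \mid G_i = 10, x\}\,\pi_{10|x}. \qquad \text{(S6)}
\end{aligned}
$$
Under Assumption 4, $\pi_{10|x} = 0$ and the other conditional probabilities of principal strata are identified by
$$
\begin{aligned}
\pi_{11|x} &= \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 0, x), \\
\pi_{00|x} &= \mathrm{pr}(M_i^{\mathrm{obs}} = 0 \mid Z_i = 1, x), \\
\pi_{01|x} &= 1 - \pi_{11|x} - \pi_{00|x} = \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 1, x) - \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 0, x).
\end{aligned}
$$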
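The three identification formulas for the strata proportions translate directly into a small computation. The sketch below is illustrative only: the function name `strata_proportions` is hypothetical, and its inputs are assumed to be the observed mediator rates $\mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = z, x)$ at a fixed covariate value $x$.

```python
def strata_proportions(p_m1_given_z1, p_m1_given_z0):
    """Return (pi_11, pi_00, pi_01) under monotonicity M_i(1) >= M_i(0),
    given pr(M_obs = 1 | Z = 1, x) and pr(M_obs = 1 | Z = 0, x)."""
    if p_m1_given_z1 < p_m1_given_z0:
        # A negative pi_01 would contradict monotonicity (pi_10 = 0).
        raise ValueError("observed rates contradict monotonicity")
    pi_11 = p_m1_given_z0                    # M_i(0) = 1 forces M_i(1) = 1
    pi_00 = 1.0 - p_m1_given_z1              # M_i(1) = 0 forces M_i(0) = 0
    pi_01 = p_m1_given_z1 - p_m1_given_z0    # remaining stratum
    return pi_11, pi_00, pi_01


pi_11, pi_00, pi_01 = strata_proportions(0.7, 0.4)
print(pi_11, pi_00, pi_01)
```

The check on the sign of $\pi_{01|x}$ is a design choice of this sketch: it surfaces data that are incompatible with Assumption 4 rather than returning a negative proportion.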
Identification of the conditional mean of $Y_i(1, M_{i0})$ within stratum $G_i = 00$ follows from
$$
\begin{aligned}
E\{Y_i(1, M_{i0}) \mid G_i = 00, x\} &= E\{Y_i(1, M_{i1}) \mid M_i(0) = M_i(1) = 0, x\} \\
&= E\{Y_i(1, M_{i1}) \mid M_i(1) = 0, x\} \\
&= E\{Y_i(1, M_{i1}) \mid Z_i = 1, M_i(1) = 0, x\} \\
&= E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 0, x),
\end{aligned}
$$
where the first equality holds because $Y_i(1, M_{i0}) = Y_i(1, M_{i1})$ for $G_i = 00$, the second equality holds because of Assumption 4, the third equality holds because of Assumption 1, and the last equality holds because of the composition and consistency assumptions.
Identification of the conditional mean of $Y_i(1, M_{i0})$ within stratum $G_i = 01$ follows from
$$
\begin{aligned}
E\{Y_i(1, M_{i0}) \mid G_i = 01, x\} &= E\{Y_i(1, M_{i0}) \mid M_i(0) = 0, M_i(1) = 1, x\} \\
&= E\{Y_i(1, 0) \mid M_i(0) = 0, M_i(1) = 1, x\} \\
&= E\{Y_i(1, 0) \mid M_i(0) = 0, M_i(1) = 0, x\} \\
&= E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 0, x), \qquad \text{(S7)}
\end{aligned}
$$
where the first equality holds by definition, the second equality holds because $Y_i(1, M_{i0}) = Y_i(1, 0)$ for $G_i = 01$, the third equality holds because of Assumption 6, and the last equality holds because of consistency $Y_i^{\mathrm{obs}} = Y_i(Z_i, M_i^{\mathrm{obs}})$.

Identification of the conditional mean of $Y_i(1, M_{i0})$ within stratum $G_i = 11$ follows from
$$
\begin{aligned}
E\{Y_i(1, M_{i0}) \mid G_i = 11, x\} &= E\{Y_i(1, M_{i0}) \mid M_i(0) = 1, M_i(1) = 1, x\} \\
&= E\{Y_i(1, 1) \mid M_i(0) = 1, M_i(1) = 1, x\} \\
&= E\{Y_i(1, 1) \mid M_i(1) = 1, x\} \\
&= E\{Y_i(1, 1) \mid Z_i = 1, M_i(1) = 1, x\} \\
&= E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 1, x), \qquad \text{(S8)}
\end{aligned}
$$
where the first equality holds by definition, the second equality holds because $Y_i(1, M_{i0}) = Y_i(1, 1)$ for $G_i = 11$, the third equality holds because of Assumption 5, the fourth equality holds because of Assumption 1, and the last equality follows from consistency.
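Therefore, we can use the above ingredients to simplify (S6) as
$$
\begin{aligned}
E\{Y_i(1, M_{i0}) \mid x\} ={}& E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 0, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = 0 \mid Z_i = 1, x) \\
& + E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 0, x) \times \{ \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 1, x) - \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 0, x) \} \\
& + E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 1, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 0, x) \\
={}& \sum_{m = 0, 1} E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = m, x) \times \mathrm{pr}(M_i^{\mathrm{obs}} = m \mid Z_i = 0, x). \qquad \text{(S9)}
\end{aligned}
$$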


Second, we turn to the identification of the natural direct effect $\mathrm{NDE}(0 \mid x)$:
$$
\mathrm{NDE}(0 \mid x) = E\{Y_i(1, M_{i0}) \mid x\} - E\{Y_i(0, M_{i0}) \mid x\} = E\{Y_i(1, M_{i0}) \mid x\} - E\{Y_i(0) \mid x\},
$$
where the first term is identified by (S9) and the second term is identified by $E(Y_i^{\mathrm{obs}} \mid Z_i = 0, x)$ under Assumption 1.
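Third, we prove the result for the natural indirect effect $\mathrm{NIE}(1 \mid x)$. The following decomposition
$$
\begin{aligned}
\mathrm{NIE}(1 \mid x) &= E\{Y_i(1, M_{i1}) \mid x\} - E\{Y_i(1, M_{i0}) \mid x\} \\
&= \left[ E\{Y_i(1, M_{i1}) \mid G_i = 01, x\} - E\{Y_i(1, M_{i0}) \mid G_i = 01, x\} \right] \times \pi_{01|x} \qquad \text{(S10)}
\end{aligned}
$$
holds under Assumption 4 because $Y_i(1, M_{i1}) = Y_i(1, M_{i0})$ for strata $G_i = 11$ and $G_i = 00$. We can use (S7) to identify $E\{Y_i(1, M_{i0}) \mid G_i = 01, x\}$ in (S10) and use the following result to identify $E\{Y_i(1, M_{i1}) \mid G_i = 01, x\}$ in (S10):
$$
\begin{aligned}
E\{Y_i(1, M_{i1}) \mid G_i = 01, x\} &= E\{Y_i(1, 1) \mid M_i(0) = 0, M_i(1) = 1, x\} \\
&= E\{Y_i(1, 1) \mid M_i(0) = 1, M_i(1) = 1, x\} \\
&= E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 1, x),
\end{aligned}
$$
where the first equality holds because $Y_i(1, M_{i1}) = Y_i(1, 1)$ for $G_i = 01$, the second equality holds because of Assumption 5, and the last equality follows from (S8). Therefore, (S10) reduces to
$$
\begin{aligned}
\mathrm{NIE}(1 \mid x) ={}& \{ E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 1, x) - E(Y_i^{\mathrm{obs}} \mid Z_i = 1, M_i^{\mathrm{obs}} = 0, x) \} \\
& \times \{ \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 1, x) - \mathrm{pr}(M_i^{\mathrm{obs}} = 1 \mid Z_i = 0, x) \}.
\end{aligned}
$$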
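The identification formulas in this proof suggest simple sample analogues. The following is a minimal sketch under stated simplifications, not an estimator proposed in the text: it ignores covariates (equivalently, it works within a single level of $x$), replaces each conditional mean and probability by its sample proportion, and uses the hypothetical function name `natural_effects` with a made-up data set.

```python
def natural_effects(records):
    """records: iterable of (z, m, y) with z, m in {0, 1}.
    Returns sample analogues of E{Y(1, M(0))} from (S9), NDE(0) and NIE(1).
    Assumes every (z, m) cell used below is non-empty."""
    records = list(records)

    def mean_y(z, m=None):
        # Sample mean of Y_obs given Z = z (and M_obs = m if m is supplied).
        ys = [y for (zi, mi, y) in records if zi == z and (m is None or mi == m)]
        return sum(ys) / len(ys)

    def pr_m1(z):
        # Sample proportion of M_obs = 1 given Z = z.
        ms = [mi for (zi, mi, _) in records if zi == z]
        return sum(ms) / len(ms)

    p1, p0 = pr_m1(1), pr_m1(0)
    y1_m0 = mean_y(1, 0) * (1 - p0) + mean_y(1, 1) * p0   # (S9)
    nde0 = y1_m0 - mean_y(0)                              # NDE(0)
    nie1 = (mean_y(1, 1) - mean_y(1, 0)) * (p1 - p0)      # NIE(1)
    return {"y1_m0": y1_m0, "nde0": nde0, "nie1": nie1}


# Made-up data: four treated and four control units.
data = [(1, 1, 4.0), (1, 1, 4.0), (1, 1, 4.0), (1, 0, 2.0),
        (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0)]
print(natural_effects(data))
```

In this toy data set the two estimates sum to the difference in mean outcomes between treated and control units, as expected from the decomposition of the total effect into $\mathrm{NDE}(0)$ and $\mathrm{NIE}(1)$.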


S2·9. Proofs of Propositions S1 and S2


The proofs are similar to those of Propositions 6 and 7.